WELL-POSEDNESS AND CONVERGENCE
OF SOME REGULARIZATION METHODS
FOR NONLINEAR ILL-POSED PROBLEMS


Thomas I. Seidman and Curtis R. Vogel


ABSTRACT: In this paper we analyze two regularization methods for nonlinear ill-posed
problems. The first is a penalty method called Tikhonov regularization, in
which  one  solves  an  unconstrained  optimization  problem  while  the  second  is  based  on 
a  constrained  optimization  problem.  For  each  method  we  examine  the  well-posedness 
of  the  respective  optimization  problem.  We  then  show  strong  convergence  of  the 
regularized  ‘solutions’  to  the  true  solution.  (Note  that  this  is  well  known  for  the 
application  of  these  methods  to  linear  problems.)  In  this  analysis  we  consider  such 
factors  as  the  convergence  of  perturbed  data  to  the  true  data,  inexact  solution  of  the 
respective  optimization  problems,  and  the  choice  of  the  regularization  parameters. 


Key  words:  ill-posed,  regularization,  convergence,  approximation 

1.  Introduction

Consider  a  nonlinear  operator  equation 

(1.1)  $\bar F(x) = \bar y$

where $\bar F : X \to Y$ for Banach spaces $X$, $Y$. We assume that (1.1) has a unique solution
$\bar x$ but is ill-posed in the sense that this solution does not depend continuously on the
data, i.e., perturbed problems

(1.2)  $F(x) = y$

may have no solution or, if (1.2) has a (not necessarily unique) solution $x$, one may
have a large change $(x - \bar x)$ for the solution corresponding to very small variations in
the data: $(y - \bar y)$ and $[F(\cdot) - \bar F(\cdot)]$. Note that the precise specification of the nonlinear
operator/function $F(\cdot)$ is considered part of the data. A 'typical' example might be
a nonlinear Fredholm integral equation of first kind with $F(\cdot)$ of the form (see footnote 4):

(1.3)  $(F(x))(t) := \int_0^1 \ln\left[ \frac{(t-\sigma)^2 + H^2}{(t-\sigma)^2 + |H - x(\sigma)|^2} \right] d\sigma \quad (0 \le t \le 1)$

which  arises  in  inverse  gravimetry  [8],  p.15.  For  other  examples,  see,  e.g.,  [2],  [4], [5]. 

The pejorative term "ill-posed" arose from Hadamard's attitude that such problems
could not be treated usefully and would occur only by erroneously considering
a problem in an unreasonable context. However, a variety of genuine applications
(which, a priori, are reasonable contexts!) do force on us the consideration of such
problems and, fortunately, techniques have been found for useful treatment.

In practical applications, the data [F, y] are known only approximately, with
perturbations/uncertainties arising from several sources. First, measurements are
inherently inexact, although one usually can assume that this inexactitude can be
arbitrarily reduced, albeit with increasing cost for improved precision. Second, obtaining
a tractable mathematical description of the problem introduces so-called 'modelling
error'. Third, inaccuracies arise from computational limitations (e.g., approximation
from a finite basis, finite-precision computer arithmetic, ...); again, these can usually
be arbitrarily improved with increased cost. In effect, one must always deal with
perturbed problems (1.2).

Footnote 4: The value of the parameter $H > 0$ is experimentally determined. The 'true' function
$\bar F$ has a similar form with the 'true' value $\bar H$.

In order to obtain reasonable approximations to the 'true' solution $\bar x$ of (1.1), one
must solve a 'regularized' problem constructed from the perturbed data:

(1.4)  $[F, y, \gamma] \mapsto x$

where $\gamma$ parametrizes the approximation scheme in a way which reflects a priori
information, estimates of the perturbation magnitude, etc. Such a regularized problem
should have the properties:

(1.5)  (i) well-posedness: given the triple $[F, y, \gamma]$, one can,
       indeed, obtain $x$ in (1.4); further, with
       $\gamma$ fixed the $x$ obtained is unique and depends stably
       (continuously) on $[F, y]$, at least for $[F, y]$ 'close to' $[\bar F, \bar y]$;

       (ii) convergence: given a sequence $[F, y]_k \to [\bar F, \bar y]$,
       appropriate choice of $\gamma = \gamma_k$ gives $x_k \to \bar x$ via (1.4).

The condition (i) is of great practical importance. Since the regularized problem
defined by $[F, y, \gamma]$ is to be solved computationally, this problem should be robust.
The condition (ii) is of obvious theoretical importance. It ensures that the true
solution $\bar x$ can be obtained with arbitrary precision if one will 'pay the price'. This
is the same result as for so-called 'well-posed problems', although here one expects a
much more rapid increase of cost with demands for increased precision, to the extent
that requests for more than moderate accuracy become practically infeasible.

Perhaps the best known such regularization technique is the method of Tikhonov
regularization [7] in which (taking the abstract parameter $\gamma$ of (1.4) to be a number
$\alpha > 0$) one obtains solutions $x_\alpha$ by solving the unconstrained minimization problem:

(1.6)  $\|F(x) - y\|_Y^2 + \alpha J(x) = \min \quad (x \in X)$

where $J(\cdot)$ is a non-negative penalty function suitably chosen to incorporate a priori
information about the true solution. (Often one takes $J(x) := \|Lx - z\|_Z^2$ where the
choice of $L : X \to Z$ indicates assumptions as to the regularity of $\bar x$; see (2.2).)

Alternatively, one may choose an explicit bound $\beta$ on the penalty term $J(x)$ and
solve the constrained minimization problem

(1.7)  $\|F(x) - y\|_Y^2 = \min \quad (x \in X,\ J(x) \le \beta)$

to obtain an approximate solution $x_\beta$ (using (1.7) to define (1.4) with $\gamma$ replaced by
$\beta > 0$). See [8] where this regularization technique is used to solve the nonlinear
integral equation given by using (1.3) in (1.2).
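
The relation between (1.6) and (1.7) can be illustrated by a minimal Python sketch. It rests on strong simplifying assumptions that are not part of the text: F is taken to be linear and diagonal with nonzero entries sigma_i, and J(x) = ||x||^2. The function names are hypothetical. In this special case the minimizer of (1.6) has a closed form, and an active constraint in (1.7) can be met by searching for the matching value of alpha.

```python
def tikhonov_diag(sigma, y, alpha):
    """Minimizer of sum((s*x - y)^2) + alpha*sum(x^2) for a diagonal operator.

    Assumption for illustration only: F(x)_i = sigma[i] * x[i], J(x) = sum(x_i^2).
    """
    return [s * yi / (s * s + alpha) for s, yi in zip(sigma, y)]


def constrained_diag(sigma, y, beta):
    """Minimizer of sum((s*x - y)^2) subject to sum(x^2) <= beta (all s nonzero)."""
    # If the unregularized solution is feasible, it already solves the problem.
    x = [yi / s for s, yi in zip(sigma, y)]
    if sum(v * v for v in x) <= beta:
        return x

    # Otherwise the constraint is active: look for alpha with J(x_alpha) = beta.
    def penalty(alpha):
        return sum(v * v for v in tikhonov_diag(sigma, y, alpha))

    lo, hi = 0.0, 1.0
    while penalty(hi) > beta:      # the penalty decreases as alpha grows
        hi *= 2.0
    for _ in range(200):           # plain bisection on alpha
        mid = 0.5 * (lo + hi)
        if penalty(mid) > beta:
            lo = mid
        else:
            hi = mid
    return tikhonov_diag(sigma, y, hi)
```

For example, `constrained_diag([1.0, 0.1, 0.01], [1.0, 0.1, 0.02], 2.0)` returns a vector whose squared norm sits on the bound 2.0 up to rounding, whereas the unregularized solution has squared norm 6. For genuinely nonlinear F no such closed form is available and an iterative optimizer would be needed.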

The  object  of  this  report  is  to  examine  the  abstract  techniques  (1.6),  (1.7)  under 
reasonable  hypotheses  for  which  we  can  demonstrate  the  properties  (1.5).  A  similar 
analysis  was  carried  out  in  [3]  for  the  method  of  generalized  interpolation  and  variants. 

2.  Well-Posedness  of  Regularized  Problems 

We  will  be  considering,  abstractly,  schemes  of  the  forms  (1.6)  or  (1.7)  for  which 
the  basic  ingredients  are: 
•  the spaces $X$, $Y$;

•  the penalty function $J(\cdot) : X \to [0, \infty]$ (but $J \not\equiv \infty$);

•  the form of $F(\cdot)$ with a topology for perturbations.

We begin with assumptions on $X$, $J(\cdot)$:

(2.1)  (i) $X$ is a Banach space and there is given a continuous
       map $P_0 : X \to X_0$ = [another Banach space];

       (ii) for each (finite) $\gamma > 0$ and any sequence $\{x_k\}$ in $X$
       such that $J(x_k) \le \gamma$ and $\{P_0 x_k\}$ is bounded in $X_0$,
       there is a subsequence converging weakly in
       $X$ to some $x$ for which $J(x) \le \gamma$.

The condition (ii) is essentially lower semicontinuity of $J(\cdot)$ together with a coercivity
condition on $[J(x) + \|P_0 x\|]$.

As a typical setting for applications, consider a Hilbert space $X$, a closed (densely
defined) linear operator $L : X \to Z$ = [another Hilbert space], $P_0$ the orthogonal
projection: $X \to X_0 := N(L)$ and set

(2.2)  $J(x) := \{ \|Lx - z\|_Z^2$ for $x \in D(L)$; $\infty$ else $\}$.

For example, it might be convenient to take $X = Y = L^2(0,1)$ and let $L = d/dt$ so $J$
effectively penalizes the $H^1$-norm; here one would have $X_0$ = {constants}.
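
As a rough illustration of this example, the following Python sketch evaluates a discrete analogue of (2.2) with L = d/dt on a uniform grid. The forward-difference discretization, the grid spacing `h`, and the function name are assumptions made here for illustration; they do not come from the text.

```python
def derivative_penalty(x, h, z=None):
    """Discrete analogue of J(x) = ||Lx - z||^2 in L^2 with L = d/dt.

    x : samples of the function on a uniform grid with spacing h
    z : optional samples of the reference function z (defaults to zero)
    Assumption: forward differences approximate the derivative.
    """
    dx = [(x[i + 1] - x[i]) / h for i in range(len(x) - 1)]
    if z is None:
        z = [0.0] * len(dx)
    # Riemann sum for the squared L^2 norm of (Lx - z).
    return h * sum((d - zi) ** 2 for d, zi in zip(dx, z))
```

With z = 0, constants are not penalized at all, which matches the remark that $X_0$ consists of the constants: `derivative_penalty([3.0] * 9, 0.125)` is 0, and adding a constant to any grid function leaves the penalty unchanged.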

Suppose $L$ were to have a continuous right inverse $R : Z \to N(P_0)$ so $LRz = z$
for $z \in Z$. Then, given any sequence $\{x_k\} \subset X$, we may write $x_k = u_k + v_k$ with
$u_k := P_0 x_k \in N(L)$ so $J(x_k) = J(v_k)$. If $J(x_k) \le \gamma < \infty$ then $x_k, v_k \in D(L)$ and
we set $z_k := L x_k = L v_k \in Z$. Note that $R z_k = v_k$ since $L R z_k = z_k = L v_k$ gives
$(R z_k - v_k) \in N(L)$ whereas $v_k, R z_k \in N(P_0)$. We have $\|z_k\|_Z \le \gamma^{1/2} + \|z\|_Z$ so
there is a subsequence $\{k(j)\}$ for which $z_{k(j)} \rightharpoonup \hat z$ (weak convergence in $Z$) whence
$v_{k(j)} = R z_{k(j)} \rightharpoonup R \hat z =: v$ (weak convergence in $N(P_0) \subset X$, with $v \in D(L)$
since $J$ is lower semicontinuous for this weak topology). Since we assume $\{u_k\}$ is
bounded, we may extract further a subsubsequence for which also $u_{k'(j)} \rightharpoonup u$
(weak convergence in $N(L) \subset X$) so $x_{k'(j)} \rightharpoonup (u + v) =: x$. Clearly $Lx = Lv = \hat z =$
w-lim $z_{k(j)}$ so $J(x_k) = \|z_k - z\|_Z^2 \le \gamma$ gives $J(x) = \|\hat z - z\|_Z^2 \le \gamma$ and we have
demonstrated (2.1)(ii). Note, also, that if we can only verify the surjectivity of $L$,
hence the surjectivity of the restriction $L_0$ of $L$ to $N(P_0)$, then it follows from the
Closed Graph Theorem that a continuous $R$ exists as above.

We  have  proved  the  following: 

Lemma 1: Let $X$, $Z$ be Hilbert spaces and $L : X \supset D(L) \to Z$ a closed linear
operator such that $Lx = z$ is solvable for each $z \in Z$. Then, defining $J(\cdot)$ by
(2.2) and taking $P_0$ to be the orthogonal projection on $X_0 := N(L)$, we have (2.1).

□

Clearly, using $J(\cdot)$ as in Lemma 1 for (1.6) or (1.7) gives $x_\alpha$ (resp., $x_\beta$) in $D(L)$.
Similar arguments permit the construction of such $J(\cdot)$ for more general $X$ than
Hilbert spaces: essentially, one needs reflexivity of $N(L)$ and of $Z$ and the existence
of a closed complement to $N(L)$ in $X$.

Next, we consider assumptions on $Y$ and on the admissible functions $F(\cdot)$:

(2.3)  (i) $Y$ is a Banach space; $F(\cdot) : X \supset D(F) \to Y$ with
       $\{x \in D(F) \subset X : J(x) < \infty\}$ nonempty;

       (ii) for any sequence $\{x_k\}$ in $D(F)$ such that
       $x_k \rightharpoonup x$ weakly in $X$ with $\{F(x_k)\}$ bounded in $Y$,
       we have $x \in D(F)$ and $F(x_k) \rightharpoonup F(x)$ weakly in $Y$;

       (iii) $F(\cdot)$ is '$P_0$-coercive': i.e., if $\{F(x_k)\}$ is defined and bounded in $Y$
       then $\{P_0 x_k\}$ is bounded in $X_0$.

For (i), (iii) we are, of course, considering the same $J(\cdot)$, $P_0$ as in (2.1). We remark
that (ii) and (iii) will only be applied to sequences for which $\{J(x_k)\}$ is also bounded
and so need be verified only in that context. Note that, for reflexive $Y$, the condition
(ii) is equivalent to assuming that the graph of $F(\cdot)$ is closed with respect to sequential
weak convergence in $X \times Y$.

As a typical setting for applications, consider $X$, $Y$ reflexive and $F(\cdot)$ of the form:

(2.4)  $F(x) := A(z)x + G(z)$ with $z := Bx$

where $B$ is a compact linear map: $X \to Z$ = [another Banach space] and

$A(\cdot) : Z \to B(X,Y)$ := [continuous linear maps: $X \to Y$],
$G : Z \to Y_w$ := [$Y$ topologized by sequential weak convergence]

are continuous nonlinear maps. If $x_k \rightharpoonup x$ weakly in $X$, then one easily sees that
$z_k := B x_k \to z := Bx$ strongly in $Z$ whence $A_k := A(z_k) \to A := A(z)$ in $B(X,Y)$
and $g_k := G(z_k) \rightharpoonup g := G(z)$ weakly in $Y$. Now

$A_k x_k - A x = (A_k - A) x_k + A(x_k - x)$

and the first term on the right goes strongly to 0 (as $\|A_k - A\| \to 0$ and $\{x_k\}$ is
bounded) while the second term goes weakly to 0 in $Y$ as $x_k \rightharpoonup x$ in $X$. Thus one
has (2.3)(i), (ii); verification of (2.3)(iii) is likely to be more application-specific.

Theorem 1: Suppose (2.1), (2.3) hold. Then, for arbitrary $y \in Y$, each of the
problems (1.6) (for arbitrary $\alpha > 0$) and (1.7) (for large enough $\beta > 0$) has a
solution, i.e., the minimum is attained in each case.

PROOF: We first consider the constrained problem (1.7). Set

$S_\beta := \{x \in D(F) \subset X : J(x) \le \beta\}$

and note that $S_\beta$ must be nonempty for large enough (finite) $\beta$ by (2.3)(i). Suppose
$\{x_k\}$ is any minimizing sequence for $\|F(x) - y\|_Y$ on $S_\beta$. Certainly $\{F(x_k)\}$ is bounded
in $Y$ so, by (2.3)(iii), $\{P_0 x_k\}$ is also bounded in $X_0$. By (2.1)(ii), it then follows that
there is a subsequence (again denoted by $\{x_k\}$ for simplicity) which converges weakly
in $X$, i.e., $x_k \rightharpoonup \hat x$ with $J(\hat x) \le \beta$. Now, by (2.3)(ii) we have $\hat x \in D(F)$ (so $\hat x \in S_\beta$)
and $F(x_k) \rightharpoonup F(\hat x)$ weakly in $Y$. By the (weak) lower semicontinuity of the $Y$-norm, it then
follows that

$\|F(\hat x) - y\|_Y \le \liminf \|F(x_k) - y\|_Y = \inf\{\|F(x) - y\|_Y : x \in S_\beta\}$.

Thus the minimum over $S_\beta$ is attained at $\hat x$.

The proof for (1.6) is much the same. Given any minimizing sequence $\{x_k\}$ for the
unconstrained problem (1.6), we necessarily have both $\{\|F(x_k) - y\|\}$ and $\{J(x_k)\}$
bounded since each is non-negative and $\alpha > 0$. As above, there is a subsequence
converging weakly in $X$ to some $\hat x$ and we again obtain $\hat x \in D(F)$ and the minimum
is attained at $\hat x$. □

This shows the existence part of (1.5)(i) for these two regularization methods.
Uniqueness is rather more difficult, independent of any difficulties due to ill-posedness
of (1.2). One possible way to obtain uniqueness (and continuous dependence on $y$)
for (1.6) would be to require that the functional $\|F(x) - y\|_Y^2$ be convex in $x$, making
its derivative (assuming sufficient regularity) a monotone operator so the regularizing
term can provide strict monotonicity and so uniqueness of the minimizer. While
this is immediate for linear $F(\cdot)$ in Hilbert space settings, it is likely to be somewhat
restrictive for nonlinear functions, yet can, on occasion, prove useful; see [9]. A certain
weaker continuity property with respect to perturbation of $y \in Y$ can be obtained
without imposing any new assumptions.

Theorem 2: Assume (2.1), (2.3). Fixing $\alpha > 0$ in (1.6), let $x_n$ be a minimizer for
$(1.6)_n$, i.e., for (1.6) with $y = y_n$ where $y_n \to y_*$ in $Y$. Then there is some subsequence
$\{x_{n(k)}\}$ converging weakly in $X$ to some minimizer $x_*$ for $(1.6)_*$, i.e., (1.6) with $y = y_*$;
if $(1.6)_*$ has a unique minimizer, then $x_n \rightharpoonup x_*$. A similar result holds for (1.7).

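PROOF: For (1.6), set

$Q(x,y) := \|F(x) - y\|_Y^2 + \alpha J(x)$

for $x \in D(F) \subset X$ and $y \in Y$. We have, for any $\varepsilon > 0$, existence of $C_\varepsilon$ such that

$(a + b)^2 \le [1 + \varepsilon] a^2 + C_\varepsilon b^2 \quad (a, b \in \mathbb{R})$

whence, as $\alpha J \ge 0$, one has

(2.5)  $Q(x,y) \le (1 + \varepsilon) Q(x,y') + C_\varepsilon \|y - y'\|_Y^2 \quad (x \in D(F);\ y, y' \in Y)$.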
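
For completeness, the elementary inequality and the step to (2.5) can be spelled out as follows. The explicit constant $C_\varepsilon = 1 + 1/\varepsilon$ is one admissible choice, made here for illustration; the text only asserts that some $C_\varepsilon$ exists.

```latex
% Young's inequality with weight \varepsilon:
2ab \;\le\; \varepsilon a^2 + \varepsilon^{-1} b^2
\quad\Longrightarrow\quad
(a+b)^2 \;\le\; (1+\varepsilon)\,a^2 + (1+\varepsilon^{-1})\,b^2 .

% Apply this with a = \|F(x)-y'\|_Y and b = \|y-y'\|_Y, using the triangle
% inequality \|F(x)-y\|_Y \le a + b:
\|F(x)-y\|_Y^2 \;\le\; (1+\varepsilon)\,\|F(x)-y'\|_Y^2 + C_\varepsilon\,\|y-y'\|_Y^2 .

% Since \alpha J(x) \ge 0, also \alpha J(x) \le (1+\varepsilon)\,\alpha J(x);
% adding the two estimates gives (2.5):
Q(x,y) \;\le\; (1+\varepsilon)\,Q(x,y') + C_\varepsilon\,\|y-y'\|_Y^2 .
```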
From this we obtain, with $x_0$ some minimizer for $(1.6)_*$,

$Q(x_n, y_*) \le (1 + \varepsilon) Q(x_n, y_n) + C_\varepsilon \|y_n - y_*\|_Y^2$   by (2.5)

$\le (1 + \varepsilon) Q(x_0, y_n) + C_\varepsilon \|y_n - y_*\|_Y^2$   by minimality for $x_n$

$\le (1 + \varepsilon)^2 Q(x_0, y_*) + (2 + \varepsilon) C_\varepsilon \|y_n - y_*\|_Y^2$   by (2.5).

The right hand side can be made arbitrarily close to the minimum $Q(x_0, y_*)$ by first
taking $\varepsilon > 0$ small and then $\|y_n - y_*\|$ small enough. Hence $\{x_n\}$ is a minimizing
sequence for $(1.6)_*$. The existence of the subsequence $x_{n(k)} \rightharpoonup x_*$ then follows as in
the proof of Theorem 1 which also shows that $x_*$ minimizes $(1.6)_*$. Convergence of
the full sequence when the limit of convergent subsequences is unique is a standard
argument. The proof for (1.7) is essentially the same. □

3.  Convergence  of  Regularized  Solutions 

For our convergence analysis we assume a sequence of perturbed problems

(3.1)  $F_k(x) = y_k$

with data $[F_k, y_k]$ converging, in a sense to be made precise below, to the 'true' data
$[\bar F, \bar y]$ of (1.1). We do not, of course, consider (3.1) directly (i.e., as an equation
determining possible solutions $x_k$) but wish to use the data to consider a corresponding
sequence of regularized problems, either using Tikhonov regularization as in (1.6)
or constrained minimization as in (1.7). From the regularized problems we obtain a
sequence $\{x_k\}$ and, under reasonable assumptions on the problem and our approaches,
we wish to demonstrate convergence to the 'true solution' $\bar x$:

(3.2)  $x_k \to \bar x$ strongly in $X$.

For the method of Tikhonov regularization, we modify (1.6) slightly (see footnote 5) and consider
the unconstrained approximate minimization problem:

(3.3)  $\|F_k(x) - y_k\|_Y^2 + \alpha_k J(x) \le \inf + \delta_k \quad (x \in D(F_k) \subset X)$

where $\delta_k > 0$ is a small parameter. (We will later take $\alpha_k \to 0$ and $\delta_k \to 0$ as
$[F_k, y_k] \to [\bar F, \bar y]$.) Similarly, we modify the regularization given through (1.7)
(see footnote 6) and consider the approximate problem:

(3.4)  $\|F_k(x) - y_k\|_Y^2 \le \inf + \delta_k \quad (x \in D(F_k) \subset X;\ J(x) \le \beta_k)$.

Footnote 5: This modification reflects computational reality: one never actually expects to obtain the exact
minimum (even when it is attained) but can find $x$ giving values arbitrarily close to that. A side
effect of this modification is that (3.3) makes sense, theoretically and computationally, even when
the minimum may not be attained; this will permit us to relax somewhat the restrictions on $F(\cdot)$
which would be imposed by (2.3).

Footnote 6: The significance of the modification (replacing "= min" by "$\le \inf + \delta_k$") is the same here as indicated
above for (1.6), (3.3).
We assume $D(F_k)$ is nonempty and $\beta_k$ is large enough that

(3.5)  $S_k := \{x \in D(F_k) \subset X : J(x) \le \beta_k\} \ne \emptyset \quad (k = 1, 2, \ldots)$

so (approximate) minimization over $S_k$ is meaningful in (3.4).

Most applications can be formulated so $D_k := D(F_k) = D(\bar F) =: D_*$, independent
of $k$; indeed, usually with $D_k = D_* = X$. Occasionally, however, it is convenient to
incorporate partly in the specification of $D_k$ a computational implementation of $F_k$
defined, e.g., only for 'mesh functions'. For simplicity (see footnote 7) we take $X$ as fixed and embed
$D_k$ (e.g., as a subspace) in $X$ with suitable approximation properties familiar from
the numerical analysis literature; similarly, we assume the codomain of each $F_k(\cdot)$ is
(embedded in) the fixed Banach space $Y$. Note that in applications the approximating
nature of $F_k(\cdot)$ as a perturbation of $\bar F(\cdot)$ includes the treatment of modelling errors,
the specification of relevant parameter values (in some general form of $F(\cdot)$, e.g.,
as the value of $H$ in (1.3)), and also the nature of the computational implementation
to be used.

The relevant notion of convergence, $F_k(\cdot) \to \bar F(\cdot)$, is most easily viewed as a
geometric notion of convergence for the graphs, considered as subsets of $X \times Y$.
Specializing the definition from [3] to the present case, we have:

DEFINITION: We say "$\{F_k(\cdot)\}$ is graph-subconvergent to $\bar F(\cdot)$" if:

(3.6)  Given any subsequence $\{k(j)\}$ and a sequence $\{x_j\}$ in $X$ such that
       $x_j \in D(F_{k(j)})$, $x_j \rightharpoonup x$ weakly in $X$ and
       $y_j := F_{k(j)}(x_j) \to y$ strongly in $Y$,
       we have $x \in D(\bar F)$ and $y = \bar F(x)$.

If, in addition, we have:

(3.7)  For each $x \in D(\bar F)$ there is some sequence $\{x_k\}$ in $X$
       such that $x_k \in D(F_k)$ with $x_k \to x$ strongly in $X$
       and $F_k(x_k) \to \bar F(x)$ strongly in $Y$,

then we say "$\{F_k(\cdot)\}$ is graph-convergent to $\bar F(\cdot)$".

See [3], [4] for further discussion and some examples. We remark here that (3.7)
will be needed only for $x = \bar x$, the (unique) solution of (1.1), and need only be verified
for $\bar x$.

We will continue to impose (2.1), as before, but now adjoin an additional assumption
regarding the penalty function $J(\cdot)$:

(3.8)  For any weakly convergent sequence $\{x_k\}$ in $X$ (so $x_k \rightharpoonup x_*$),
       if $J(x_k) \to J(x_*) < \infty$ then the sequence $\{x_k\}$ is strongly convergent in $X$.

Footnote 7: Generalization of this framework is possible but, for our present purposes, seems an unwarranted
complication.
It is not difficult to relate this to a geometric condition on the level surfaces of $J(\cdot)$:
(3.8) is implied by the condition:

(3.9)  Given $\hat x \in X$ with $J(\hat x) < \infty$ and given $\varepsilon > 0$, there is a $\delta > 0$
       and a 'cylinder set' $C \subset X$ of the form
       $C := \{x \in X : |\langle \xi_k, x - \hat x \rangle| < \delta$ for $k = 1, \ldots, K\}$
       (with each $\xi_k \in X^*$) such that $\hat x \in C$ and
       $C \cap \{x : |J(x) - J(\hat x)| < \delta\} \subset B_\varepsilon(\hat x) := \{x \in X : \|x - \hat x\|_X < \varepsilon\}$.

It is well known that, in any uniformly convex Banach space $X$, the norm (or any
strictly increasing function of it) has this property (3.9). Indeed, one can take $K = 1$
and $\xi$ to be the support functional at $\hat x$ to $\{x \in X : \|x\| \le \|\hat x\|\}$ in constructing $C$,
and so $J(x) := \|x\|_X^2$ satisfies (3.8).
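
In the special case of a Hilbert space $X$ with $J(x) := \|x\|_X^2$, property (3.8) can be checked directly, without the cylinder-set construction. The following short computation is included as an illustration of that special case only; it is not the argument for general uniformly convex spaces.

```latex
% Assume x_k \rightharpoonup x_* weakly in a Hilbert space X and \|x_k\|^2 \to \|x_*\|^2.
% Expand the squared distance:
\|x_k - x_*\|^2
  = \|x_k\|^2 - 2\,\mathrm{Re}\,\langle x_k, x_* \rangle + \|x_*\|^2 .

% Weak convergence gives \langle x_k, x_* \rangle \to \|x_*\|^2, hence
\|x_k - x_*\|^2 \;\longrightarrow\; \|x_*\|^2 - 2\,\|x_*\|^2 + \|x_*\|^2 = 0 ,
% i.e. x_k \to x_* strongly in X.
```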

Returning to (2.2) in the setting of Lemma 1, we note that $D(L)$ becomes a Hilbert
space (a fortiori uniformly convex) under the norm $[\|P_0 x\|^2 + J(x)]^{1/2}$; the topologies
of weak convergence are compatible. If $X_0 := N(L)$ is finite dimensional then weak
convergence in $X$ already gives strong convergence in $X_0$ and (3.8) is easily verified;
similar considerations apply to more general convex penalty functions.

In view of the modifications of (1.6), (1.7) and our somewhat different present
perspective, we discard the earlier assumptions (2.3) as possibly applying to each of
the $F_k(\cdot)$. Instead, we introduce new assumptions on the limit problem (1.1) and on
the sequence $\{F_k(\cdot)\}$:

(3.10)  (i) $Y$ is a Banach space; $\bar F(\cdot) : X \supset D(\bar F) \to Y$;
        $F_k(\cdot) : X \supset D(F_k) \to Y$ for $k = 1, 2, \ldots$; $\{F_k(\cdot)\}$ is
        graph-subconvergent to $\bar F(\cdot)$ in the sense of (3.6);

        (ii) there is some sequence $\{\hat x_k\}$ with $\hat x_k \in D(F_k)$
        such that $\hat x_k \to \bar x$ strongly in $X$, $J(\hat x_k) \to \bar\beta$ (where $\bar\beta := J(\bar x)$),
        and $F_k(\hat x_k) \to \bar y$ strongly in $Y$;

        (iii) the sequence $\{F_k(\cdot)\}$ is '$P_0$-coercive': if $\{F_k(x_k)\}$
        is defined and bounded in $Y$ then $\{P_0 x_k\}$ is
        bounded in $X_0$.

Typically, in applications one has $D(F_k) \supset D(\bar F)$ (e.g., $D(F_k) = X$) and (3.10)(ii)
is automatic: one simply takes $\hat x_k = \bar x$ for $k = 1, 2, \ldots$. Note that the earlier set of
assumptions (2.3) just corresponds to (3.10) with $F_k(\cdot) = \bar F(\cdot) = F(\cdot)$.

We  are  now  ready  to  demonstrate,  separately,  the  convergence  (3.2)  for  each  of 
the  regularization  techniques:  (3.3)  and  (3.4).  We  consider  first  the  approach  by 
constrained  minimization. 
Theorem 3: Assume (2.1), (3.8). Assume $[\bar F, \bar y]$ is such that (1.1) has a unique
solution $\bar x \in X$; assume (3.10) and that $y_k \to \bar y$ strongly in $Y$. Let $\beta_k \to \bar\beta$ with
each $\beta_k$ large enough to give (3.5); this is always possible. Let $\{x_k\}$ be any sequence
satisfying (3.4) for each $k$ with $0 < \delta_k \to 0$; such sequences always exist. Then one
has strong convergence: $x_k \to \bar x$ strongly in $X$.

PROOF: By (3.10)(ii) one need only take, e.g., $\beta_k \ge J(\hat x_k) \to \bar\beta$ to have (3.5); the
form of (3.4) with $0 < \delta_k$ then ensures existence (see footnote 8) of $x_k$ satisfying (3.4) for each $k$.
Clearly $\{\beta_k\}$ is bounded so $\{J(x_k)\}$ is bounded. Since (3.4) gives

$\|F_k(x_k) - y_k\|_Y^2 \le (\|F_k(\hat x_k) - \bar y\|_Y + \|\bar y - y_k\|_Y)^2 + \delta_k$

and the properties of $\{\hat x_k\}$, $\{y_k\}$, $\{\delta_k\}$ make the right hand side go to 0, we see that
$\{F_k(x_k)\}$ is not only bounded but, also, $F_k(x_k) \to \bar y$ strongly in $Y$. By (3.10)(iii), we
then have $\{P_0 x_k\}$ bounded in $X_0$ as well. The assumption (2.1)(ii) then applies and
we have existence of a subsequence $\{x_{k(j)}\}$ such that $x_j := x_{k(j)} \rightharpoonup \hat x$ weakly in $X$ for
some $\hat x$. Applying (3.10)(i), the definition (3.6) of graph-subconvergence ensures that
$\hat x \in D(\bar F)$ and $\bar F(\hat x) = \lim F_{k(j)}(x_j) = \bar y$. The assumed uniqueness of the solution of
(1.1) then implies that $\hat x :=$ w-lim $x_{k(j)}$ must be $\bar x$ and so, by a standard argument,
that $x_k \rightharpoonup \bar x$ weakly in $X$.

Now suppose $\liminf J(x_k) < \bar\beta$. We could then find $a < \bar\beta$ with $J(x_k) \le a$ for
infinitely many $k$. Applying (2.1)(ii) to these $x_k$ gives a subsequence $\{x_{k(j)}\}$ such that
$x_{k(j)} \rightharpoonup \hat x$ weakly in $X$ with $J(\hat x) \le a < \bar\beta$. Since we have already shown $x_k \rightharpoonup \bar x$,
we must have $\hat x = \bar x$, and $J(\bar x) < \bar\beta = J(\bar x)$
is a contradiction. Thus, $\liminf J(x_k) \ge \bar\beta$. On the other hand, $J(x_k) \le \beta_k$ by
(3.4) and $\beta_k \to \bar\beta$ so $\limsup J(x_k) \le \bar\beta$. This shows $J(x_k) \to \bar\beta := J(\bar x)$ and (3.8)
gives the desired strong convergence. □

The argument in the case of Tikhonov regularization is rather similar. In this
case we will need a condition on the sequence $\{\alpha_k\}$ of regularizing parameters: it
must go to 0 but must do so 'slowly enough'. We will be assuming $y_k \to \bar y$ in $Y$ and
existence of a sequence $\{\hat x_k\}$ as in (3.10)(ii). Set

(3.11)  $\hat\nu_k := \|F_k(\hat x_k) - y_k\|_Y$,  $\hat\gamma_k := J(\hat x_k)$.

We will require

(3.12)  (i) $0 < \alpha_k \to 0$;

        (ii) $\hat\nu_k^2 / \alpha_k \to 0$

and remark here that such sequences $\{\alpha_k\}$ always exist since $\hat\nu_k \to 0$, noting that
$F_k(\hat x_k) \to \bar y$ by (3.10)(ii) and $y_k \to \bar y$.

Footnote 8: At this point we remark that the mere existence of such $x_k$ is not really at issue; after all, we were
willing to assume (3.10)(ii). The point is that, given (3.5), we can expect a feasible implementation
(for each $k$) enabling us actually to compute explicitly an $x_k$ satisfying (3.4). It is the convergence
to $\bar x$ of this computed sequence which is the real point of this theorem.




Theorem 4: Assume (2.1), (3.8). Assume $[\bar F, \bar y]$ is such that (1.1) has a unique
solution $\bar x \in X$; assume (3.10) and that $y_k \to \bar y$ strongly in $Y$. Let $\alpha_k$ satisfy (3.12)
and take $0 < \delta_k$ with $\delta_k / \alpha_k \to 0$. Then for any sequence $\{x_k\}$ satisfying (3.3) we have
strong convergence: $x_k \to \bar x$ strongly in $X$.

PROOF: The form of (3.3) with $\alpha_k > 0$ ensures existence of a solution $x_k$ for each
(3.3). As above for (3.4), we remark that we are interested in the particular $x_k$
obtained by some explicit computational procedure; for this $\{x_k\}$ set

$\nu_k := \|F_k(x_k) - y_k\|_Y$,  $\gamma_k := J(x_k)$.

The minimality property (3.3) gives

(3.13)  $\nu_k^2 + \alpha_k \gamma_k \le \hat\nu_k^2 + \alpha_k \hat\gamma_k + \delta_k$.

The right side goes to 0 as $\hat\nu_k \to 0$, $\alpha_k \to 0$, $\delta_k \to 0$, $\hat\gamma_k \to \bar\beta$; hence, as in the
previous proof, we have $\{F_k(x_k)\}$ bounded with $F_k(x_k) \to \bar y$ strongly in $Y$ and, again
by (3.10)(iii), $\{P_0 x_k\}$ is bounded in $X_0$ as well.

To bound $\{\gamma_k\}$, we observe that the estimate (3.13) can be divided by $\alpha_k$ to give

$\gamma_k \le (\hat\nu_k^2 / \alpha_k) + \hat\gamma_k + (\delta_k / \alpha_k)$.

We have assumed $\delta_k / \alpha_k \to 0$ and, by (3.12)(ii), that $\hat\nu_k^2 / \alpha_k \to 0$; by (3.10)(ii) we
have $\hat\gamma_k \to \bar\beta$. Thus, we have

(3.14)  $\limsup J(x_k) \le \bar\beta$.

In particular, $\{J(x_k)\}$ is bounded and (2.1)(ii) applies to give existence of a
subsequence $\{x_{k(j)}\}$ with $x_{k(j)} \rightharpoonup \hat x$ weakly in $X$. The graph-subconvergence (3.10)(i)
ensures that $\hat x \in D(\bar F)$ with $\bar F(\hat x) = \lim F_{k(j)}(x_{k(j)}) = \bar y$ (since $\nu_{k(j)} \to 0$ by (3.13)).

The argument that $\liminf J(x_k) \ge \bar\beta$ is exactly as in the proof of Theorem 3 and,
with (3.14), again gives $J(x_k) \to \bar\beta := J(\bar x)$. Again, application of (3.8) gives the
desired strong convergence (3.2). □
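
The role of the parameter rule (3.12) can be seen in a small numerical experiment. The Python sketch below is only an illustration under assumptions that are not in the text: a linear diagonal operator with entries 1/i, penalty J(x) = ||x||^2, exact minimization (so no $\delta_k$ is needed), a deterministic data perturbation of size `eta` per component, and the choice alpha = eta, for which the ratio in (3.12)(ii) is proportional to eta and tends to 0. All names are hypothetical.

```python
import math


def tikhonov_diag(sigma, y, alpha):
    """Closed-form minimizer of sum((s*x - y)^2) + alpha*sum(x^2)."""
    return [s * yi / (s * s + alpha) for s, yi in zip(sigma, y)]


def error_for_noise(eta, n=5):
    """Distance between the Tikhonov solution and the true solution.

    Assumptions for illustration: F(x)_i = x_i / i, true solution (1, ..., 1),
    perturbed data y + eta * (-1)^i, and regularization parameter alpha = eta.
    """
    sigma = [1.0 / i for i in range(1, n + 1)]
    x_true = [1.0] * n
    y_true = [s * x for s, x in zip(sigma, x_true)]
    y_pert = [y + eta * (-1) ** i for i, y in enumerate(y_true, start=1)]
    alpha = eta  # nu^2 / alpha = n * eta, which tends to 0 with eta
    x_reg = tikhonov_diag(sigma, y_pert, alpha)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_reg, x_true)))


if __name__ == "__main__":
    for k in range(1, 7):
        eta = 10.0 ** (-k)
        print(f"eta = {eta:.0e}  error = {error_for_noise(eta):.3e}")
```

In this finite-dimensional toy problem the error shrinks steadily as the perturbation and the parameter go to 0 together. The example does not exhibit the genuinely infinite-dimensional difficulty, where a parameter that decays too quickly relative to the data error can destroy convergence.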


4.  An  Exemplary  Application 

In  this  section  we  will  discuss  an  important  ill-posed  inverse  problem  arising  in 
‘remote  sensing’  of  the  atmosphere.  Here  one  wishes  to  estimate  the  atmospheric 
temperature  profile  from  infrared  radiation  measurements  taken  by  a  satellite  at  the 
top  of  the  atmosphere. 

The  relationship  between  radiative  intensity  I  and  temperature  T  is  modelled  by 
the  nonlinear  integral  equation 

(4.1)  $I(\nu) = f_s(\nu, T(a)) + \int_a^b f(\nu, p, T(p))\, dp$
Here $\nu$ is the wavenumber and $p$ denotes atmospheric pressure, a monotonically
decreasing function of height above the surface. The function $f_s$ models the contribution
from the surface and is continuous in both its arguments. The function $f$ is nonlinear
in $T$ and depends smoothly on all three arguments (see footnote 9). Making the identifications

$y = I(\cdot)$,  $x = T(\cdot)$,  and  $F(x) = f_s(\cdot, x(a)) + \int_a^b f(\cdot, p, x(p))\, dp$

gives a nonlinear operator equation: $F(x) = y$. In practice, measurements of $y$ (i.e.,
of $I(\cdot)$) are subject to instrument error. Moreover, simplifying assumptions are used
to derive the model equation (4.1). Thus we must deal with a perturbation of the
underlying 'true' problem $\bar F(x) = \bar y$.

Since we expect atmospheric temperature to vary smoothly with height, we assume
that $x$ lies in $X := H^1(a,b)$. We will make the (physically reasonable) assumption
that the measurements $y$ lie in $Y := L^2(c,d)$. Under these assumptions, the problem
$F(x) = y$ is ill-posed. From the continuity of $f_s$, the smoothness of the kernel $f$, and
the fact that weak convergence in $H^1[a,b]$ implies strong convergence in $C[a,b]$, one
can easily show that the operator $F : X \to Y$ is weakly continuous. If we define the
penalty functional $J(x)$ as in (2.2) with $Z := L^2(a,b)$ and $Lx := dx/dt$, then the
results of the previous sections hold. Numerical results using the method of Tikhonov
regularization (1.6) have been obtained for this problem by O'Sullivan and Wahba [2].

Now let us consider a modification of this. In practice, at a certain height the
gradient of the temperature of the atmosphere may vary quite rapidly: the dependence
of temperature on height may still be smooth when viewed on a microscopic scale,
but on a macroscopic scale, it is convenient to assume a jump in the derivative of $x$.
Both the location $\tau$ and the magnitude $m$ of this jump are unknown, and both are
to be estimated. In this case we may consider the 'parametrization':

(4.2)  $x(t) = u(t) + m \cdot (t - \tau) H(t - \tau)$,

where $u \in H^2(a,b)$ and $H(\cdot)$ denotes the Heaviside function. The inverse problem
may now be reformulated as solving $F(u,m,\tau) = y$ for the triple $[u,m,\tau]$ in $X' :=
H^2(a,b) \times \mathbb{R} \times [a,b]$, where

$F : X' \to L^2(c,d) : [u,m,\tau] \mapsto y$

is given by

(4.3)  $F(u,m,\tau) = f_s(\cdot, u(a)) + \int_a^b f(\cdot, p, u(p) + m \cdot (p - \tau) H(p - \tau))\, dp$.

Footnote 9: Details, including the exact form of $f_s$ and of the kernel $f(\nu, p, T)$, appear in [1].
A penalty functional $J(x)$ such as (2.2) is no longer appropriate. Instead, we may
consider

$J(u, m, \tau) = \|u\|_{H^2(a,b)}^2 + m^2 + (\tau - a)^2$.

The results of the previous sections now apply to this example. To obtain (2.1), note
that when $\{J(u_k, m_k, \tau_k)\}$ is bounded we can extract a subsequence $\{k(j)\}$ for which
$\{u_{k(j)}\}$ converges weakly to some $u$ in $H^2(a,b)$, $\{m_{k(j)}\}$ converges to some $m \in \mathbb{R}$,
and $\{\tau_{k(j)}\}$ converges to some $\tau \in [a,b]$. To obtain (2.3)(ii), consider

(4.4)  $x_k(t) := u_k(t) + m_k \cdot (t - \tau_k) H(t - \tau_k)$,
       $x(t) := u(t) + m \cdot (t - \tau) H(t - \tau)$.

If $\{u_k\}$ converges weakly to $u$ in $H^2[a,b]$, $m_k \to m$, and $\tau_k \to \tau$, we can easily show
that $\{x_k\}$ converges to $x$ in $C[a,b]$ (uniform convergence) and, from the continuity of
$f_s$ and the smoothness of $f$, show that $\{F(x_k)\}$ converges to $F(x)$ in $Y = L^2(c,d)$.
Theorem 1 now applies and, subject to the existence/uniqueness assumptions for the
solution, so does Theorem 2. Similarly, one may verify conditions (3.8) and (3.10)
to apply Theorems 3 and 4.

It is instructive to consider the contingency that the true solution does not, in fact,
involve a jump. The representation above will cover this case by taking $m = 0$ but we
note that this leaves $\tau$ indeterminate and so introduces a spurious nonuniqueness
for the 'true' solution. Looking more carefully at, e.g., the proof of Theorem 3, we
observe that the uniqueness was used only to ensure that the convergent subsequences
extracted all converged to the same limit, namely the true solution. That would remain
the case here even though different such subsequences might involve convergence to
different representations of that solution.
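
The parametrization (4.2) and the spurious nonuniqueness at $m = 0$ can be made concrete with a few lines of Python. This is a minimal sketch: the function name and the sample smooth part `u` are hypothetical and chosen only for illustration.

```python
def kinked_profile(u, m, tau):
    """Return x(t) = u(t) + m * (t - tau) * H(t - tau), following (4.2).

    u   : callable giving the smooth part of the profile
    m   : magnitude of the jump in the derivative
    tau : location of the jump
    """
    def x(t):
        # Heaviside factor: the extra linear term is switched on for t > tau.
        kink = m * (t - tau) if t > tau else 0.0
        return u(t) + kink
    return x


if __name__ == "__main__":
    u = lambda t: 2.0 * t            # hypothetical smooth part
    x = kinked_profile(u, 3.0, 0.5)  # slope 2 before t = 0.5, slope 5 after
    print(x(0.25), x(0.5), x(1.0))
```

The profile itself stays continuous at `tau`; only its slope changes, by `m`. With `m = 0` the value of `tau` has no effect on `x`, so distinct triples $[u, 0, \tau]$ represent the same function, which is exactly the nonuniqueness of representation discussed above.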

Numerical  implementation  of  these  ideas  is  straightforward;  see  Shiau  [5],  [6]. 
Extension  of  this  analysis  to  the  possibility  of  several  jumps  (with  a  bound  on 
the  number)  is  immediate.  The  extension  of  these  ideas  for  solutions  in  two  or  three 
variables  appears  likely. 